700 research outputs found

    Faster algorithms for 1-mappability of a sequence

    Full text link
    In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n/ log log n) and space O(n). We present two algorithms that require worst-case time O(mn) and O(n log^2 n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present an algorithm that requires average-case time and space O(n) for integer alphabets if m = {\Omega}(log n/ log {\sigma}), where {\sigma} is the alphabet size

    Enriching rare variants using family-specific linkage information

    Get PDF
    Genome-wide association studies have been successful in identifying common variants for common complex traits in recent years. However, common variants have generally failed to explain substantial proportions of the trait heritabilities. Rare variants, structural variations, and gene-gene and gene-environment interactions, among others, have been suggested as potential sources of the so-called missing heritability. With the advent of exome-wide and whole-genome next-generation sequencing technologies, finding rare variants in functionally important sites (e.g., protein-coding regions) becomes feasible. We investigate the role of linkage information to select families enriched for rare variants using the simulated Genetic Analysis Workshop 17 data. In each replicate of simulated phenotypes Q1 and Q2 on 697 subjects in 8 extended pedigrees, we select one pedigree with the largest family-specific LOD score. Across all 200 replications, we compare the probability that rare causal alleles will be carried in the selected pedigree versus a randomly chosen pedigree. One example of successful enrichment was exhibited for gene VEGFC. The causal variant had minor allele frequency of 0.0717% in the simulated unrelated individuals and explained about 0.1% of the phenotypic variance. However, it explained 7.9% of the phenotypic variance in the eight simulated pedigrees and 23.8% in the family that carried the minor allele. The carrier’s family was selected in all 200 replications. Thus our results show that family-specific linkage information is useful for selecting families for sequencing, thus ensuring that rare functional variants are segregating in the sequencing samples

    SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Illumina's second-generation sequencing platform is playing an increasingly prominent role in modern DNA and RNA sequencing efforts. However, rapid, simple, standardized and independent measures of run quality are currently lacking, as are tools to process sequences for use in downstream applications based on read-level quality data.</p> <p>Results</p> <p>We present SolexaQA, a user-friendly software package designed to generate detailed statistics and at-a-glance graphics of sequence data quality both quickly and in an automated fashion. This package contains associated software to trim sequences dynamically using the quality scores of bases within individual reads.</p> <p>Conclusion</p> <p>The SolexaQA package produces standardized outputs within minutes, thus facilitating ready comparison between flow cell lanes and machine runs, as well as providing immediate diagnostic information to guide the manipulation of sequence data for downstream analyses.</p

    Scalable Transcriptome Preparation for Massive Parallel Sequencing

    Get PDF
    Background: The tremendous output of massive parallel sequencing technologies requires automated robust and scalable sample preparation methods to fully exploit the new sequence capacity. Methodology: In this study, a method for automated library preparation of RNA prior to massively parallel sequencing is presented. The automated protocol uses precipitation onto carboxylic acid paramagnetic beads for purification and size selection of both RNA and DNA. The automated sample preparation was compared to the standard manual sample preparation. Conclusion/Significance: The automated procedure was used to generate libraries for gene expression profiling on the Illumina HiSeq 2000 platform with the capacity of 12 samples per preparation with a significantly improved throughput compared to the standard manual preparation. The data analysis shows consistent gene expression profiles in terms of sensitivity and quantification of gene expression between the two library preparation methods

    Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions

    Full text link
    Improvements in sequencing technologies and reduced experimental costs have resulted in a vast number of studies generating high-throughput data. Although the number of methods to analyze these "omics" data has also increased, computational complexity and lack of documentation hinder researchers from analyzing their high-throughput data to its true potential. In this chapter we detail our data-driven, transkingdom network (TransNet) analysis protocol to integrate and interrogate multi-omics data. This systems biology approach has allowed us to successfully identify important causal relationships between different taxonomic kingdoms (e.g. mammals and microbes) using diverse types of data

    Next-generation sequencing of common osteogenesis imperfecta-related genes in clinical practice

    Get PDF
    Next generation sequencing (NGS) is a rapidly developing area in genetics. Utilizing this technology in the management of disorders with complex genetic background and not recurrent mutation hot spots can be extremely useful. In this study, we applied NGS, namely semiconductor sequencing to determine the most significant osteogenesis imperfecta-related genetic variants in the clinical practice. We selected genes coding collagen type I alpha-1 and-2 (COL1A1, COL1A2) which are responsible for more than 90% of all cases. CRTAP and LEPRE1/P3H1 genes involved in the background of the recessive forms with relatively high frequency (type VII and VIII) represent less than 10% of the disease. In our six patients (1-41 years), we identified 23 different variants. We found a total of 14 single nucleotide variants (SNV) in COL1A1 and COL1A2, 5 in CRTAP and 4 in LEPRE1. Two novel and two already well-established pathogenic SNVs have been identified. Among the newly recognized mutations, one results in an amino acid change and one of them is a stop codon. We have shown that a new full-scale cost-effective NGS method can be developed and utilized to supplement diagnostic process of osteogenesis imperfecta with molecular genetic data in clinical practice

    Assessing Matched Normal and Tumor Pairs in Next-Generation Sequencing Studies

    Get PDF
    Next generation sequencing technology has revolutionized the study of cancers. Through matched normal-tumor pairs, it is now possible to identify genome-wide germline and somatic mutations. The generation and analysis of the data requires rigorous quality checks and filtering, and the current analytical pipeline is constantly undergoing improvements. We noted however that in analyzing matched pairs, there is an implicit assumption that the sequenced data are matched, without any quality check such as those implemented in association studies. There are serious implications in this assumption as identification of germline and rare somatic variants depend on the normal sample being the matched pair. Using a genetics concept on measuring relatedness between individuals, we demonstrate that the matchedness of tumor pairs can be quantified and should be included as part of a quality protocol in analysis of sequenced data. Despite the mutation changes in cancer samples, matched tumor-normal pairs are still relatively similar in sequence compared to non-matched pairs. We demonstrate that the approach can be used to assess the mutation landscape between individuals

    Midgut microbiota of the malaria mosquito vector Anopheles gambiae and Interactions with plasmodium falciparum Infection

    Get PDF
    The susceptibility of Anopheles mosquitoes to Plasmodium infections relies on complex interactions between the insect vector and the malaria parasite. A number of studies have shown that the mosquito innate immune responses play an important role in controlling the malaria infection and that the strength of parasite clearance is under genetic control, but little is known about the influence of environmental factors on the transmission success. We present here evidence that the composition of the vector gut microbiota is one of the major components that determine the outcome of mosquito infections. A. gambiae mosquitoes collected in natural breeding sites from Cameroon were experimentally challenged with a wild P. falciparum isolate, and their gut bacterial content was submitted for pyrosequencing analysis. The meta-taxogenomic approach revealed a broader richness of the midgut bacterial flora than previously described. Unexpectedly, the majority of bacterial species were found in only a small proportion of mosquitoes, and only 20 genera were shared by 80% of individuals. We show that observed differences in gut bacterial flora of adult mosquitoes is a result of breeding in distinct sites, suggesting that the native aquatic source where larvae were grown determines the composition of the midgut microbiota. Importantly, the abundance of Enterobacteriaceae in the mosquito midgut correlates significantly with the Plasmodium infection status. This striking relationship highlights the role of natural gut environment in parasite transmission. Deciphering microbe-pathogen interactions offers new perspectives to control disease transmission.Institut de Recherche pour le Developpement (IRD); French Agence Nationale pour la Recherche [ANR-11-BSV7-009-01]; European Community [242095, 223601]info:eu-repo/semantics/publishedVersio
    corecore